Spatial data brings a whole set of new problems with it. A point of particular divergence from our regression models, is brought on by the fact that spatial data is often autocorrelated in the random component. That means that nearby errors are often correlated, and violates one of our core assumptions of regression (about the error terms being independent from one another). We need to adjust our models to account for this problem.

With spatial data we are often interested in asking questions about things like:

R has a whole suite of tools for dealing with spatial data. And the visualisation capabilities in particular are very nicely set up. Let’s spend a little bit of time looking at mapping before we move onto generating some statistical models for our data.

If you really want to find out about how to deal with spatial data in R, then this is a great site, full of some lovely tutorials on everything you are likely to want to know: https://data.cdrc.ac.uk/tutorial/an-introduction-to-spatial-data-analysis-and-visualisation-in-r

I definitely recommend this site! You will need to create an account and log in, but it is definitely worth your while. It has very nice workbooks on Point mapping, kernel smoothing, spatial autocorrelation… almost everything you are likely to know. I have taken a lot of what follows from that site, but this workbook will only be scratching the surface of what is there! (Albeit in one relatively quick to work through workbook as opposed to the 12 Practicals that you will find in that site!)

Mapping

When we are dealing with spatial data, we can often end up with a number of different types of data and these can be stored in quite complex datatypes. For example, maps often include:

The sp package is a standard way of dealing with spatial data (stored as objects of type Spatial Polygons). Many packages depend upon it, so it is worth learning about if you are planning to do anything with spatial data.

Often we will get information about spatial objects (i.e. maps) from what is called an OGR data source, which is something that has been constructed according to the Geospatial Data Abstraction Library (GDAL), which is really just a data standard for geospatial data (https://en.wikipedia.org/wiki/GDAL).

R has some pretty good tools for dealing with this stuff. In particular, the readOGR() function in the rgdal package, can be used to read in OGR data. Let’s load it:

library(sp)
library(rgdal)
rgdal: version: 1.2-12, (SVN revision 681)
 Geospatial Data Abstraction Library extensions to R successfully loaded
 Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
 Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/gdal
 Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
 Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/proj
 Linking to sp version: 1.2-5 

We can use readOGR to load up the OGR files that have been made available here: https://data.cdrc.ac.uk/dataset/aa5491c9-cbac-4026-97c9-f9168462f4ac/resource/f8ecc2bc-5313-468d-9653-11f0fd752a7d/download/camdenoa11.zip

Make sure that you download them now, and move them to the directory where you are running your code! (i.e. the pathname might not work!) Have a look at the files - they should look like this:

R knows how to load them all if you specify the path name (“Cambden” for me - I have them in a folder) and the name of the file (i.e. the bit out the front without the filename extension.)

Output.Areas<- readOGR("Camden","Camden_oa11")
OGR data source with driver: ESRI Shapefile 
Source: "Camden", layer: "Camden_oa11"
with 749 features
It has 1 fields

Remember to run ?readOGR to find out more about this function (and the OGR format).

You could look at the structure this map using str(Output.Areas) but its REALLY long! I do recommend you examine the structure of this beasty at least once to get a feeling for what is going on in it. Also - beware - the actual data object itself is even longer! (It takes a lot of information to make a map!) Let’s just look at the head of this data for now

head(Output.Areas)

Hmmmm… doesn’t tell us much ;) you had better explore it in your console (but I am not going to do that here because it will make us all do too much scrolling to get past it).

The weird structure you get occurs because sp objects are stored as S4 objects in R. This is a specific object oriented data format, which is quite complex. You can find out more about all of this in this data camp course https://www.datacamp.com/courses/working-with-geospatial-data-in-r (which is quite good! It is definitely worth your while learning about S4 objects in R, and this is a good way to do it.)

Ok… this thing is of class “SpatialPolygonsDataFrame”. That is a complex beast! (Try running ?SpatialPolygonsDataFrame to find out a bit more about them.) A cool thing about these objects, is that they have a nice handy plot method:

plot(Output.Areas)

Its a map of Cambden! (I guess we should have expected that given that the files are called Cambden_* :| - have a look here if you don’t believe me: https://en.wikipedia.org/wiki/London_Borough_of_Camden).

If you want to find some Australian GIS data then you could try these sites:
- http://www.naturalearthdata.com/downloads/
- http://openstreetmapdata.com/data

Ok… so we have our map. What can we do with it? Well… if we had some data then we could join it to our map… Then we could do lots of cool things ;)

Luckily the same site makes census data for the UK available in an easy to use form. You can download it here: https://data.cdrc.ac.uk/dataset/aa5491c9-cbac-4026-97c9-f9168462f4ac/resource/27949d58-8d9b-43b0-b0dd-dfbe47609cbf/download/practicaldata.csv (don’t forget to put it in the right place so that R can find it):

Census.Data <-read.csv("practicaldata.csv")

(Have a look a the data! What is in there?)

Ok. How do we join it? We use a merge function that comes with the sp package:

OA.Census <- merge(Output.Areas, Census.Data, by.x="OA11CD", by.y="OA")

(Make sure you look up what I did there using the helpfile… in particular - can you work out what is going on with those by.x and by.y options?)

Ok… no errors so it looks like the data merged properly. Now what do we have?

head(OA.Census)
An object of class "SpatialPolygonsDataFrame"
Slot "data":
       OA11CD White_British Low_Occupancy Unemployed Qualification
397 E00004527      48.29060     12.745098   7.511737      35.80786
395 E00004525      40.94488     16.806723   5.990783      42.41071
392 E00004522      44.16244      8.547009   2.116402      56.47668
160 E00004287      31.86813     12.612613   3.286385      67.85714
80  E00004206      56.45161     19.685039   7.983193      31.74603
74  E00004200      39.91935     11.764706   3.524229      57.20339

Slot "polygons":
[[1]]
An object of class "Polygons"
Slot "Polygons":
[[1]]
An object of class "Polygon"
Slot "labpt":
[1] 530490.9 181273.3

Slot "area":
[1] 44544.86

Slot "hole":
[1] FALSE

Slot "ringDir":
[1] 1

Slot "coords":
           [,1]     [,2]
  [1,] 530648.4 181230.2
  [2,] 530648.0 181228.8
  [3,] 530646.6 181223.8
  [4,] 530646.0 181222.2
  [5,] 530643.1 181214.4
  [6,] 530642.2 181212.1
  [7,] 530640.7 181208.2
  [8,] 530639.9 181206.3
  [9,] 530637.0 181204.3
 [10,] 530635.6 181203.4
 [11,] 530634.9 181202.9
 [12,] 530631.9 181201.1
 [13,] 530631.2 181200.6
 [14,] 530609.2 181190.4
 [15,] 530601.3 181187.4
 [16,] 530601.2 181187.4
 [17,] 530596.8 181185.8
 [18,] 530584.2 181181.1
 [19,] 530567.7 181183.0
 [20,] 530564.7 181183.3
 [21,] 530564.1 181183.4
 [22,] 530559.2 181184.0
 [23,] 530556.0 181184.7
 [24,] 530540.7 181188.1
 [25,] 530540.5 181188.2
 [26,] 530519.1 181193.0
 [27,] 530490.9 181199.3
 [28,] 530484.5 181200.7
 [29,] 530483.7 181200.9
 [30,] 530469.9 181204.0
 [31,] 530468.5 181204.3
 [32,] 530466.4 181204.8
 [33,] 530460.4 181206.2
 [34,] 530454.8 181207.4
 [35,] 530447.6 181209.0
 [36,] 530422.1 181214.8
 [37,] 530421.8 181214.8
 [38,] 530421.5 181214.9
 [39,] 530420.9 181213.9
 [40,] 530420.3 181213.0
 [41,] 530416.6 181207.0
 [42,] 530412.9 181200.9
 [43,] 530407.6 181192.3
 [44,] 530406.5 181190.5
 [45,] 530406.2 181190.0
 [46,] 530404.3 181191.8
 [47,] 530400.9 181195.1
 [48,] 530399.9 181196.1
 [49,] 530398.2 181197.7
 [50,] 530390.8 181204.7
 [51,] 530382.5 181212.7
 [52,] 530381.3 181213.9
 [53,] 530378.1 181216.9
 [54,] 530370.0 181224.7
 [55,] 530356.3 181237.7
 [56,] 530354.7 181239.3
 [57,] 530347.6 181246.0
 [58,] 530347.3 181245.7
 [59,] 530328.3 181224.8
 [60,] 530324.0 181220.1
 [61,] 530320.0 181215.8
 [62,] 530319.5 181215.2
 [63,] 530310.2 181205.0
 [64,] 530308.7 181203.3
 [65,] 530307.0 181205.0
 [66,] 530304.0 181204.0
 [67,] 530300.8 181200.7
 [68,] 530287.7 181208.2
 [69,] 530276.8 181223.2
 [70,] 530276.7 181223.3
 [71,] 530276.8 181224.2
 [72,] 530276.8 181224.2
 [73,] 530280.5 181230.4
 [74,] 530292.0 181241.4
 [75,] 530303.0 181251.9
 [76,] 530306.6 181255.3
 [77,] 530311.5 181260.0
 [78,] 530311.5 181260.1
 [79,] 530323.1 181271.1
 [80,] 530324.0 181272.0
 [81,] 530319.0 181277.0
 [82,] 530318.2 181277.8
 [83,] 530317.1 181277.7
 [84,] 530315.0 181280.0
 [85,] 530315.2 181280.2
 [86,] 530315.5 181280.5
 [87,] 530313.0 181283.0
 [88,] 530321.0 181291.4
 [89,] 530326.5 181297.2
 [90,] 530327.9 181298.7
 [91,] 530344.8 181316.5
 [92,] 530346.0 181317.6
 [93,] 530349.1 181321.0
 [94,] 530352.0 181324.0
 [95,] 530356.9 181328.4
 [96,] 530359.6 181330.8
 [97,] 530363.3 181336.6
 [98,] 530365.3 181335.5
 [99,] 530368.1 181333.8
[100,] 530366.0 181332.0
[101,] 530364.6 181330.6
[102,] 530350.4 181316.1
[103,] 530349.6 181315.3
[104,] 530349.7 181314.9
[105,] 530350.8 181306.2
[106,] 530371.9 181294.7
[107,] 530371.9 181294.7
[108,] 530377.8 181270.7
[109,] 530378.2 181269.2
[110,] 530377.9 181268.2
[111,] 530376.3 181260.9
[112,] 530382.1 181258.2
[113,] 530383.1 181256.8
[114,] 530383.9 181258.0
[115,] 530386.8 181262.3
[116,] 530397.0 181277.0
[117,] 530399.4 181280.4
[118,] 530405.7 181289.3
[119,] 530406.2 181290.0
[120,] 530409.2 181294.2
[121,] 530414.0 181301.0
[122,] 530425.1 181315.8
[123,] 530427.3 181318.8
[124,] 530427.0 181325.9
[125,] 530419.1 181330.0
[126,] 530416.5 181331.4
[127,] 530416.3 181331.5
[128,] 530411.7 181372.3
[129,] 530411.7 181372.6
[130,] 530412.5 181374.2
[131,] 530412.7 181374.3
[132,] 530413.3 181373.7
[133,] 530413.9 181374.2
[134,] 530460.0 181364.5
[135,] 530460.2 181364.2
[136,] 530465.8 181365.2
[137,] 530466.8 181365.4
[138,] 530475.0 181374.0
[139,] 530475.0 181374.0
[140,] 530475.4 181373.5
[141,] 530477.3 181370.9
[142,] 530478.0 181370.0
[143,] 530478.1 181370.1
[144,] 530478.3 181370.2
[145,] 530484.1 181374.8
[146,] 530491.3 181380.4
[147,] 530488.1 181374.8
[148,] 530489.0 181374.1
[149,] 530493.5 181370.7
[150,] 530521.2 181350.1
[151,] 530523.0 181352.0
[152,] 530564.6 181389.7
[153,] 530565.0 181390.0
[154,] 530568.0 181393.0
[155,] 530569.2 181393.5
[156,] 530569.7 181393.7
[157,] 530573.0 181395.0
[158,] 530576.0 181388.8
[159,] 530585.0 181370.0
[160,] 530587.3 181364.9
[161,] 530591.6 181357.9
[162,] 530591.0 181356.8
[163,] 530601.4 181334.0
[164,] 530607.3 181321.0
[165,] 530610.6 181313.6
[166,] 530612.7 181309.1
[167,] 530613.4 181307.6
[168,] 530645.0 181238.0
[169,] 530648.4 181230.2



Slot "plotOrder":
[1] 1

Slot "labpt":
[1] 530490.9 181273.3

Slot "ID":
[1] "0"

Slot "area":
[1] 44544.86


[[2]]
An object of class "Polygons"
Slot "Polygons":
[[1]]
An object of class "Polygon"
Slot "labpt":
[1] 530503.0 181445.9

Slot "area":
[1] 24538.26

Slot "hole":
[1] FALSE

Slot "ringDir":
[1] 1

Slot "coords":
           [,1]     [,2]
  [1,] 530511.3 181531.2
  [2,] 530511.3 181531.1
  [3,] 530512.0 181528.9
  [4,] 530512.2 181528.1
  [5,] 530511.1 181527.7
  [6,] 530517.1 181509.1
  [7,] 530517.2 181508.6
  [8,] 530518.3 181505.1
  [9,] 530520.0 181500.0
 [10,] 530520.7 181498.6
 [11,] 530523.0 181494.0
 [12,] 530527.0 181492.0
 [13,] 530530.0 181489.0
 [14,] 530531.3 181486.2
 [15,] 530535.0 181478.0
 [16,] 530536.6 181474.7
 [17,] 530546.3 181454.2
 [18,] 530554.8 181436.3
 [19,] 530578.9 181456.1
 [20,] 530581.0 181457.9
 [21,] 530599.1 181464.2
 [22,] 530599.6 181464.3
 [23,] 530615.1 181455.4
 [24,] 530624.5 181450.0
 [25,] 530625.0 181449.0
 [26,] 530625.7 181449.3
 [27,] 530652.0 181460.0
 [28,] 530660.2 181463.7
 [29,] 530659.3 181430.0
 [30,] 530660.2 181427.4
 [31,] 530641.0 181420.0
 [32,] 530638.0 181423.0
 [33,] 530636.0 181423.0
 [34,] 530635.0 181423.0
 [35,] 530631.0 181423.0
 [36,] 530621.1 181418.1
 [37,] 530617.6 181416.3
 [38,] 530595.0 181405.0
 [39,] 530574.5 181395.7
 [40,] 530573.0 181395.0
 [41,] 530569.7 181393.7
 [42,] 530569.2 181393.5
 [43,] 530568.0 181393.0
 [44,] 530565.0 181390.0
 [45,] 530564.6 181389.7
 [46,] 530523.0 181352.0
 [47,] 530521.2 181350.1
 [48,] 530493.5 181370.7
 [49,] 530489.0 181374.1
 [50,] 530488.1 181374.8
 [51,] 530491.3 181380.4
 [52,] 530484.1 181374.8
 [53,] 530478.3 181370.2
 [54,] 530478.1 181370.1
 [55,] 530478.0 181370.0
 [56,] 530477.3 181370.9
 [57,] 530475.4 181373.5
 [58,] 530475.0 181374.0
 [59,] 530475.0 181374.0
 [60,] 530466.8 181365.4
 [61,] 530465.8 181365.2
 [62,] 530460.2 181364.2
 [63,] 530460.0 181364.5
 [64,] 530413.9 181374.2
 [65,] 530423.6 181382.8
 [66,] 530430.1 181388.6
 [67,] 530436.7 181394.3
 [68,] 530450.8 181406.8
 [69,] 530450.0 181408.0
 [70,] 530448.0 181411.0
 [71,] 530445.0 181417.0
 [72,] 530443.7 181420.2
 [73,] 530437.8 181435.2
 [74,] 530430.7 181453.0
 [75,] 530422.8 181472.9
 [76,] 530419.6 181480.8
 [77,] 530413.6 181496.1
 [78,] 530412.7 181498.2
 [79,] 530412.0 181500.0
 [80,] 530410.6 181501.4
 [81,] 530410.2 181501.8
 [82,] 530410.0 181502.0
 [83,] 530402.0 181500.0
 [84,] 530380.2 181492.4
 [85,] 530378.1 181495.4
 [86,] 530384.2 181518.4
 [87,] 530390.5 181527.1
 [88,] 530392.7 181529.4
 [89,] 530395.6 181532.4
 [90,] 530399.8 181536.6
 [91,] 530420.7 181558.0
 [92,] 530421.0 181558.4
 [93,] 530424.7 181553.2
 [94,] 530447.1 181521.9
 [95,] 530447.1 181521.8
 [96,] 530452.0 181515.0
 [97,] 530454.0 181516.0
 [98,] 530461.1 181517.5
 [99,] 530469.1 181519.3
[100,] 530480.1 181521.7
[101,] 530484.0 181522.5
[102,] 530488.0 181523.4
[103,] 530500.0 181526.0
[104,] 530505.7 181529.5
[105,] 530508.0 181531.0
[106,] 530510.0 181531.0
[107,] 530511.0 181531.1
[108,] 530511.3 181531.2



Slot "plotOrder":
[1] 1

Slot "labpt":
[1] 530503.0 181445.9

Slot "ID":
[1] "1"

Slot "area":
[1] 24538.26


[[3]]
An object of class "Polygons"
Slot "Polygons":
[[1]]
An object of class "Polygon"
Slot "labpt":
[1] 530039.1 181373.1

Slot "area":
[1] 47972.09

Slot "hole":
[1] FALSE

Slot "ringDir":
[1] 1

Slot "coords":
           [,1]     [,2]
  [1,] 530207.0 181434.0
  [2,] 530209.6 181427.9
  [3,] 530219.1 181405.3
  [4,] 530227.1 181386.5
  [5,] 530231.5 181375.9
  [6,] 530234.0 181370.0
  [7,] 530234.5 181369.5
  [8,] 530234.0 181369.0
  [9,] 530230.1 181367.1
 [10,] 530230.0 181367.0
 [11,] 530222.2 181362.4
 [12,] 530218.0 181360.0
 [13,] 530196.8 181350.7
 [14,] 530193.0 181349.0
 [15,] 530191.9 181348.6
 [16,] 530188.2 181347.0
 [17,] 530178.6 181343.1
 [18,] 530166.0 181338.0
 [19,] 530149.2 181331.3
 [20,] 530136.7 181326.3
 [21,] 530126.0 181322.0
 [22,] 530123.2 181321.0
 [23,] 530112.0 181317.0
 [24,] 530107.6 181315.6
 [25,] 530092.7 181311.0
 [26,] 530087.6 181309.4
 [27,] 530080.0 181307.0
 [28,] 530073.0 181305.0
 [29,] 530065.0 181302.9
 [30,] 530054.5 181300.2
 [31,] 530034.0 181295.0
 [32,] 530031.0 181294.0
 [33,] 530030.5 181291.0
 [34,] 530029.4 181285.3
 [35,] 530027.4 181274.3
 [36,] 530016.0 181276.9
 [37,] 530012.2 181272.6
 [38,] 530012.0 181272.6
 [39,] 530012.0 181272.6
 [40,] 529971.0 181238.8
 [41,] 529956.3 181243.1
 [42,] 529957.7 181268.6
 [43,] 529958.1 181275.0
 [44,] 529965.6 181289.2
 [45,] 529960.0 181289.0
 [46,] 529958.1 181289.0
 [47,] 529953.0 181289.0
 [48,] 529948.0 181290.0
 [49,] 529940.0 181292.0
 [50,] 529936.0 181288.8
 [51,] 529935.7 181288.6
 [52,] 529920.9 181276.9
 [53,] 529911.8 181269.7
 [54,] 529906.1 181265.2
 [55,] 529893.4 181255.2
 [56,] 529890.3 181260.8
 [57,] 529885.9 181268.7
 [58,] 529880.3 181281.6
 [59,] 529864.3 181318.3
 [60,] 529863.2 181320.8
 [61,] 529862.1 181323.2
 [62,] 529860.9 181326.0
 [63,] 529870.4 181328.4
 [64,] 529877.0 181330.0
 [65,] 529882.0 181333.0
 [66,] 529883.0 181334.0
 [67,] 529891.0 181345.0
 [68,] 529891.2 181345.6
 [69,] 529892.0 181348.0
 [70,] 529889.0 181372.0
 [71,] 529887.0 181386.0
 [72,] 529896.0 181388.0
 [73,] 529895.5 181389.7
 [74,] 529894.4 181393.6
 [75,] 529903.4 181396.0
 [76,] 529904.0 181396.2
 [77,] 529908.6 181397.5
 [78,] 529908.6 181397.5
 [79,] 529922.0 181401.1
 [80,] 529923.3 181401.5
 [81,] 529931.8 181403.9
 [82,] 529932.8 181404.1
 [83,] 529940.7 181406.3
 [84,] 529959.8 181411.6
 [85,] 529974.0 181415.5
 [86,] 529984.9 181418.5
 [87,] 529991.6 181420.3
 [88,] 529994.1 181421.0
 [89,] 530016.7 181427.2
 [90,] 530016.9 181427.3
 [91,] 530022.7 181428.8
 [92,] 530030.1 181430.9
 [93,] 530031.8 181431.4
 [94,] 530033.3 181431.8
 [95,] 530041.4 181434.0
 [96,] 530046.4 181435.3
 [97,] 530061.7 181439.5
 [98,] 530049.3 181460.3
 [99,] 530049.2 181460.4
[100,] 530049.1 181460.6
[101,] 530045.8 181466.2
[102,] 530044.1 181469.4
[103,] 530043.6 181470.4
[104,] 530042.6 181472.4
[105,] 530036.6 181483.7
[106,] 530035.3 181486.2
[107,] 530035.2 181486.4
[108,] 530034.7 181487.3
[109,] 530028.3 181499.4
[110,] 530027.7 181500.5
[111,] 530021.5 181512.3
[112,] 530021.1 181513.2
[113,] 530014.6 181525.6
[114,] 530011.3 181531.8
[115,] 530009.5 181535.2
[116,] 530010.3 181535.7
[117,] 530013.0 181537.2
[118,] 530019.8 181540.9
[119,] 530021.4 181541.6
[120,] 530026.6 181543.7
[121,] 530034.8 181547.0
[122,] 530035.7 181547.4
[123,] 530041.9 181549.9
[124,] 530044.7 181551.0
[125,] 530052.3 181554.1
[126,] 530052.6 181553.2
[127,] 530056.4 181543.9
[128,] 530065.7 181538.0
[129,] 530066.7 181537.3
[130,] 530069.3 181535.7
[131,] 530069.4 181535.0
[132,] 530070.1 181531.6
[133,] 530070.6 181530.8
[134,] 530074.5 181524.5
[135,] 530076.2 181521.8
[136,] 530075.8 181515.8
[137,] 530075.3 181508.0
[138,] 530075.1 181505.0
[139,] 530068.0 181501.0
[140,] 530063.0 181498.7
[141,] 530063.1 181498.3
[142,] 530065.4 181493.3
[143,] 530067.1 181489.3
[144,] 530067.0 181487.9
[145,] 530066.8 181486.3
[146,] 530067.6 181486.6
[147,] 530082.8 181492.1
[148,] 530084.1 181490.1
[149,] 530086.4 181486.3
[150,] 530089.1 181482.1
[151,] 530096.0 181471.0
[152,] 530096.2 181470.7
[153,] 530099.1 181465.6
[154,] 530105.8 181453.8
[155,] 530108.0 181450.0
[156,] 530109.5 181450.3
[157,] 530118.1 181452.1
[158,] 530127.7 181454.1
[159,] 530130.6 181454.7
[160,] 530132.0 181455.0
[161,] 530132.7 181455.2
[162,] 530136.0 181456.0
[163,] 530136.7 181454.8
[164,] 530137.4 181453.8
[165,] 530138.4 181452.1
[166,] 530138.7 181451.7
[167,] 530146.7 181439.1
[168,] 530160.0 181418.0
[169,] 530160.3 181417.6
[170,] 530163.0 181414.4
[171,] 530165.0 181412.0
[172,] 530166.0 181410.0
[173,] 530167.4 181410.0
[174,] 530171.3 181413.6
[175,] 530173.2 181413.3
[176,] 530174.0 181413.2
[177,] 530207.0 181434.0



Slot "plotOrder":
[1] 1

Slot "labpt":
[1] 530039.1 181373.1

Slot "ID":
[1] "2"

Slot "area":
[1] 47972.09


[[4]]
An object of class "Polygons"
Slot "Polygons":
[[1]]
An object of class "Polygon"
Slot "labpt":
[1] 524404.7 185053.0

Slot "area":
[1] 9762.198

Slot "hole":
[1] FALSE

Slot "ringDir":
[1] 1

Slot "coords":
          [,1]     [,2]
 [1,] 524355.4 185053.6
 [2,] 524346.8 185064.2
 [3,] 524345.7 185065.5
 [4,] 524345.7 185065.5
 [5,] 524326.0 185087.7
 [6,] 524326.0 185087.8
 [7,] 524342.8 185107.2
 [8,] 524355.4 185108.2
 [9,] 524370.5 185109.4
[10,] 524388.5 185110.8
[11,] 524393.2 185111.2
[12,] 524396.5 185107.5
[13,] 524403.4 185097.4
[14,] 524407.2 185089.3
[15,] 524409.2 185083.7
[16,] 524410.3 185077.7
[17,] 524411.0 185077.2
[18,] 524413.6 185075.3
[19,] 524414.3 185074.6
[20,] 524419.9 185069.5
[21,] 524425.3 185063.0
[22,] 524427.1 185060.2
[23,] 524430.6 185054.8
[24,] 524473.7 185086.8
[25,] 524474.2 185087.2
[26,] 524480.0 185079.4
[27,] 524480.4 185078.9
[28,] 524489.4 185066.8
[29,] 524488.4 185066.0
[30,] 524487.8 185066.9
[31,] 524478.0 185061.0
[32,] 524449.4 185040.8
[33,] 524450.7 185040.1
[34,] 524451.2 185039.9
[35,] 524452.2 185038.6
[36,] 524461.9 185026.6
[37,] 524465.6 185020.5
[38,] 524468.8 185010.6
[39,] 524411.0 184986.3
[40,] 524410.0 184985.9
[41,] 524409.9 184985.9
[42,] 524407.8 184988.4
[43,] 524399.0 184999.1
[44,] 524399.0 184999.2
[45,] 524398.3 185000.0
[46,] 524394.2 185007.4
[47,] 524392.1 185009.9
[48,] 524388.7 185013.7
[49,] 524388.1 185014.4
[50,] 524376.2 185027.9
[51,] 524373.7 185031.0
[52,] 524370.5 185034.9
[53,] 524357.8 185050.6
[54,] 524357.8 185050.7
[55,] 524355.4 185053.6



Slot "plotOrder":
[1] 1

Slot "labpt":
[1] 524404.7 185053.0

Slot "ID":
[1] "3"

Slot "area":
[1] 9762.198


[[5]]
An object of class "Polygons"
Slot "Polygons":
[[1]]
An object of class "Polygon"
Slot "labpt":
[1] 528694.5 184499.2

Slot "area":
[1] 13799.61

Slot "hole":
[1] FALSE

Slot "ringDir":
[1] 1

Slot "coords":
           [,1]     [,2]
  [1,] 528718.5 184565.0
  [2,] 528721.1 184565.0
  [3,] 528723.9 184564.9
  [4,] 528724.0 184564.9
  [5,] 528732.1 184564.7
  [6,] 528732.5 184564.7
  [7,] 528738.5 184564.5
  [8,] 528739.4 184564.5
  [9,] 528745.0 184564.4
 [10,] 528745.3 184564.3
 [11,] 528751.2 184564.2
 [12,] 528751.5 184564.2
 [13,] 528758.0 184564.0
 [14,] 528759.0 184564.0
 [15,] 528769.8 184563.7
 [16,] 528770.5 184563.7
 [17,] 528784.5 184563.3
 [18,] 528783.2 184562.2
 [19,] 528780.2 184559.6
 [20,] 528755.0 184537.8
 [21,] 528753.5 184536.9
 [22,] 528747.6 184533.4
 [23,] 528746.5 184533.1
 [24,] 528740.5 184531.5
 [25,] 528739.5 184531.2
 [26,] 528722.3 184532.4
 [27,] 528721.7 184531.8
 [28,] 528709.8 184518.6
 [29,] 528710.1 184518.2
 [30,] 528711.0 184517.1
 [31,] 528711.1 184516.5
 [32,] 528712.3 184504.6
 [33,] 528713.2 184494.7
 [34,] 528713.2 184494.6
 [35,] 528712.3 184491.9
 [36,] 528711.3 184490.6
 [37,] 528707.7 184485.6
 [38,] 528707.1 184484.9
 [39,] 528705.5 184483.5
 [40,] 528703.7 184481.4
 [41,] 528703.5 184481.1
 [42,] 528703.3 184480.9
 [43,] 528701.7 184480.1
 [44,] 528701.5 184480.0
 [45,] 528701.4 184479.6
 [46,] 528701.4 184479.5
 [47,] 528700.8 184477.2
 [48,] 528700.7 184476.7
 [49,] 528700.7 184476.7
 [50,] 528697.7 184470.8
 [51,] 528697.7 184469.0
 [52,] 528697.7 184462.3
 [53,] 528701.9 184440.4
 [54,] 528703.9 184430.3
 [55,] 528704.4 184430.0
 [56,] 528708.2 184437.7
 [57,] 528718.8 184459.2
 [58,] 528720.6 184462.7
 [59,] 528721.8 184465.2
 [60,] 528726.6 184474.9
 [61,] 528727.1 184475.9
 [62,] 528728.1 184478.0
 [63,] 528732.3 184486.5
 [64,] 528739.0 184500.0
 [65,] 528739.2 184500.5
 [66,] 528739.5 184501.4
 [67,] 528739.5 184501.0
 [68,] 528739.6 184500.5
 [69,] 528740.0 184496.6
 [70,] 528742.6 184495.0
 [71,] 528742.8 184494.9
 [72,] 528761.0 184477.4
 [73,] 528753.9 184450.1
 [74,] 528753.7 184450.0
 [75,] 528737.0 184440.9
 [76,] 528729.3 184428.2
 [77,] 528724.3 184420.0
 [78,] 528727.8 184404.6
 [79,] 528694.0 184405.0
 [80,] 528680.0 184411.0
 [81,] 528666.0 184418.0
 [82,] 528666.0 184418.2
 [83,] 528664.7 184422.6
 [84,] 528664.4 184423.8
 [85,] 528662.1 184431.6
 [86,] 528662.1 184431.8
 [87,] 528659.1 184444.4
 [88,] 528658.0 184449.2
 [89,] 528657.9 184449.9
 [90,] 528653.7 184468.0
 [91,] 528653.6 184468.2
 [92,] 528653.2 184470.0
 [93,] 528653.2 184470.2
 [94,] 528653.1 184470.5
 [95,] 528652.3 184474.1
 [96,] 528652.2 184474.4
 [97,] 528650.3 184482.8
 [98,] 528650.2 184483.0
 [99,] 528649.3 184486.9
[100,] 528649.1 184487.8
[101,] 528648.5 184490.3
[102,] 528648.4 184490.7
[103,] 528647.3 184495.7
[104,] 528647.1 184496.5
[105,] 528645.0 184505.5
[106,] 528644.8 184506.5
[107,] 528644.6 184507.3
[108,] 528644.5 184507.5
[109,] 528643.9 184510.4
[110,] 528643.5 184512.0
[111,] 528642.1 184518.3
[112,] 528641.8 184519.5
[113,] 528641.5 184520.5
[114,] 528641.4 184521.2
[115,] 528641.3 184521.3
[116,] 528641.0 184523.0
[117,] 528640.5 184524.9
[118,] 528639.2 184530.4
[119,] 528639.0 184531.6
[120,] 528637.7 184537.0
[121,] 528637.6 184537.5
[122,] 528637.0 184540.2
[123,] 528636.8 184540.9
[124,] 528635.5 184546.7
[125,] 528635.4 184546.8
[126,] 528634.2 184552.2
[127,] 528632.4 184561.1
[128,] 528628.9 184578.9
[129,] 528628.9 184578.9
[130,] 528628.7 184580.2
[131,] 528628.4 184581.4
[132,] 528654.1 184574.7
[133,] 528654.9 184574.5
[134,] 528659.3 184573.3
[135,] 528661.6 184572.7
[136,] 528661.7 184572.7
[137,] 528669.4 184570.7
[138,] 528675.7 184569.5
[139,] 528676.0 184569.5
[140,] 528680.2 184568.7
[141,] 528680.5 184568.6
[142,] 528681.0 184568.5
[143,] 528682.3 184568.3
[144,] 528685.5 184567.7
[145,] 528690.4 184566.8
[146,] 528692.6 184566.4
[147,] 528695.2 184565.9
[148,] 528696.8 184565.6
[149,] 528697.8 184565.6
[150,] 528706.8 184565.4
[151,] 528707.1 184565.4
[152,] 528718.5 184565.0



Slot "plotOrder":
[1] 1

Slot "labpt":
[1] 528694.5 184499.2

Slot "ID":
[1] "4"

Slot "area":
[1] 13799.61


[[6]]
An object of class "Polygons"
Slot "Polygons":
[[1]]
An object of class "Polygon"
Slot "labpt":
[1] 529326.7 181750.0

Slot "area":
[1] 16969.09

Slot "hole":
[1] FALSE

Slot "ringDir":
[1] 1

Slot "coords":
           [,1]     [,2]
  [1,] 529332.4 181816.6
  [2,] 529336.9 181814.5
  [3,] 529341.9 181812.1
  [4,] 529343.2 181811.5
  [5,] 529347.8 181805.8
  [6,] 529347.9 181805.6
  [7,] 529348.3 181802.1
  [8,] 529348.8 181797.8
  [9,] 529348.8 181797.8
 [10,] 529347.8 181795.9
 [11,] 529343.8 181788.5
 [12,] 529342.1 181785.2
 [13,] 529348.0 181776.0
 [14,] 529350.7 181770.0
 [15,] 529352.6 181770.2
 [16,] 529352.7 181770.3
 [17,] 529361.4 181772.7
 [18,] 529366.8 181775.0
 [19,] 529371.0 181768.5
 [20,] 529373.2 181765.2
 [21,] 529373.6 181764.4
 [22,] 529369.0 181761.3
 [23,] 529376.3 181750.3
 [24,] 529378.4 181747.1
 [25,] 529380.0 181744.6
 [26,] 529380.2 181740.8
 [27,] 529381.0 181740.1
 [28,] 529382.5 181738.0
 [29,] 529388.7 181740.0
 [30,] 529413.0 181748.0
 [31,] 529416.1 181743.1
 [32,] 529420.1 181736.8
 [33,] 529421.1 181735.3
 [34,] 529421.5 181734.7
 [35,] 529423.1 181732.1
 [36,] 529423.2 181732.0
 [37,] 529430.4 181720.9
 [38,] 529430.0 181720.6
 [39,] 529429.4 181720.1
 [40,] 529412.1 181706.3
 [41,] 529412.0 181706.2
 [42,] 529411.8 181704.9
 [43,] 529413.3 181701.2
 [44,] 529419.6 181685.6
 [45,] 529414.9 181680.0
 [46,] 529415.4 181679.6
 [47,] 529415.5 181679.5
 [48,] 529412.8 181677.7
 [49,] 529411.5 181676.9
 [50,] 529410.0 181676.0
 [51,] 529400.9 181689.6
 [52,] 529398.0 181694.0
 [53,] 529397.0 181695.0
 [54,] 529396.0 181695.5
 [55,] 529395.0 181696.0
 [56,] 529393.4 181696.0
 [57,] 529392.4 181696.0
 [58,] 529391.2 181696.8
 [59,] 529390.5 181697.0
 [60,] 529386.2 181695.7
 [61,] 529383.8 181694.9
 [62,] 529383.7 181694.2
 [63,] 529383.4 181692.4
 [64,] 529382.8 181688.6
 [65,] 529382.1 181687.8
 [66,] 529372.9 181677.4
 [67,] 529376.1 181669.4
 [68,] 529387.0 181660.6
 [69,] 529387.0 181660.6
 [70,] 529385.5 181659.6
 [71,] 529384.2 181658.8
 [72,] 529383.4 181658.3
 [73,] 529380.9 181656.7
 [74,] 529379.5 181655.9
 [75,] 529368.7 181649.1
 [76,] 529366.9 181648.0
 [77,] 529359.3 181643.3
 [78,] 529358.5 181642.7
 [79,] 529357.9 181642.4
 [80,] 529355.4 181640.8
 [81,] 529355.1 181641.2
 [82,] 529342.4 181659.1
 [83,] 529342.4 181659.2
 [84,] 529336.8 181667.1
 [85,] 529336.7 181667.1
 [86,] 529329.9 181676.8
 [87,] 529329.9 181676.8
 [88,] 529328.5 181678.8
 [89,] 529325.0 181683.7
 [90,] 529325.0 181683.7
 [91,] 529323.0 181686.5
 [92,] 529317.2 181694.7
 [93,] 529315.2 181697.5
 [94,] 529315.0 181697.8
 [95,] 529310.1 181704.7
 [96,] 529293.3 181728.5
 [97,] 529284.2 181741.3
 [98,] 529283.3 181742.5
 [99,] 529281.2 181747.0
[100,] 529279.6 181750.3
[101,] 529278.5 181752.7
[102,] 529278.1 181753.6
[103,] 529273.4 181763.4
[104,] 529271.7 181765.9
[105,] 529270.8 181767.4
[106,] 529267.2 181771.8
[107,] 529266.3 181772.7
[108,] 529261.4 181777.8
[109,] 529261.0 181778.1
[110,] 529260.7 181778.5
[111,] 529254.1 181786.2
[112,] 529246.6 181795.0
[113,] 529238.9 181803.9
[114,] 529235.2 181808.1
[115,] 529232.9 181810.8
[116,] 529230.4 181814.1
[117,] 529230.2 181814.3
[118,] 529225.1 181820.9
[119,] 529223.4 181823.2
[120,] 529222.2 181824.7
[121,] 529219.2 181828.5
[122,] 529217.1 181831.2
[123,] 529217.0 181831.4
[124,] 529211.6 181838.3
[125,] 529211.4 181838.5
[126,] 529205.1 181846.7
[127,] 529205.2 181846.7
[128,] 529208.4 181847.3
[129,] 529213.8 181848.3
[130,] 529229.4 181851.2
[131,] 529306.3 181812.7
[132,] 529307.0 181812.3
[133,] 529310.7 181814.7
[134,] 529332.4 181816.6



Slot "plotOrder":
[1] 1

Slot "labpt":
[1] 529326.7 181750.0

Slot "ID":
[1] "5"

Slot "area":
[1] 16969.09



Slot "plotOrder":
[1] 3 1 2 6 5 4

Slot "bbox":
       min      max
x 524326.0 530660.2
y 181181.1 185111.2

Slot "proj4string":
CRS arguments:
 +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000
+y_0=-100000 +datum=OSGB36 +units=m +no_defs +ellps=airy
+towgs84=446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894 

Oooo… it is one of those SPdataframe thingos with 5 variables now (try looking at OA.Census and Output.Areas in your Environment tab on the right to work out what is different between the two sp objects - what has happened?)

Basically… we have added some data to our map! 5 new variables are now in there about the number of White_Brittish citizens there are in an area, the Low_Occupancy info, how many Unemployed, and their Qualification. Go and look at the site we got the data from to find out more about this dataset (Practical 1 has information about where the data comes from: https://data.cdrc.ac.uk/dataset/aa5491c9-cbac-4026-97c9-f9168462f4ac/resource/09a1093a-74b9-487b-a939-d928f27be612/download/practical1.html).

Now. Many different coordinate systems are possible. If you are having trouble using GIS data this could well be what is going wrong. Here, we are going to set the coordinate system to the British National Grid (which is the format the data actually came in - we are just making sure, and it is an important thing to learn about):

proj4string(OA.Census) <- CRS("+init=EPSG:27700")
A new CRS was assigned to an object with an existing CRS:
+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs +ellps=airy +towgs84=446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894
without reprojecting.
For reprojection, use function spTransform

Cool. A really good place to go if you want to learn more about mapping (as well as some of the R S4 classes like SpatialPolygonsDataFrame and how they work!) is this data camp course: https://campus.datacamp.com/courses/working-with-geospatial-data-in-r. Chapters 2 onwards are very nice, and provide a good introduction to the sp, raster and rgdal packages, setting coordinate systems etc. I strongly recommend working through this course if you want to find out more about what we are doing here (we are skimming through!)

Now… we could make our map look nicer. Let’s use the tmap package, which works quite a bit like ggplot (i.e. it layers the graphics up, so is quite powerful).

library(tmap)

Have a look at ?tm_shape and ?tm_fill and you will start working out how it can be used. For a list of all possible functions try ?tmap. That Data Camp course I mentioned above (https://campus.datacamp.com/courses/working-with-geospatial-data-in-r) is the place to go if you want to find out more about how tmap works. We don’t have time to cover it here.

Let’s try using it! What is the spread of Qualifications acorss Camden?

tm_shape(OA.Census) + tm_fill("Qualification") 

Nice - you can find out a lot more about mapping options by working through the rest of Practical 5 on https://data.cdrc.ac.uk/tutorial/an-introduction-to-spatial-data-analysis-and-visualisation-in-r (we are going to move onto more statistical things!)

More information

We are not going to cover mapping in any more detail - you could easily complete a whole course just in this! For more information you could start with the following two resources (which I have already referred to above!)

Point Pattern Analysis

So… we have a map. Let’s put some more data onto it. The same site makes some point data about house prices available via this link: https://data.cdrc.ac.uk/dataset/aa5491c9-cbac-4026-97c9-f9168462f4ac/resource/24c4f527-43c8-4c09-8558-ad6a48e17de5/download/camdenhousesales15.csv

Let’s look at it (make sure you ahve downloaded the data from the site! You will need to log in to access it which is why we are not bothering to read it directly):

houses <- read.csv("CamdenHouseSales15.csv")
houses

Right-o. We have seen things like this before. Let’s quickly plot out what looks like the lattitude/longitude data:

plot(houses$oseast1m, houses$osnrth1m)

Yep - it also looks like Cambden! We can probably just map the two datasets together. If we turn this object (which is just a dataframe) into a SpatialPointsDataFrame (like OA.Census) then we will be able to layer them on one another! We need to be a bit careful about that coordinate frame thing again though (i.e. we need to set it again).

So. We need to set what the data is to be included (houses), what columns contain the x and y coordinates (houses[,3:4]), and what projection system we are using (CRS("+init=EPSG:27700")).

House.Points <-SpatialPointsDataFrame(houses[,8:9], houses, proj4string = CRS("+init=EPSG:27700"))

Let’s print it just to be sure:

House.Points
class       : SpatialPointsDataFrame 
features    : 2547 
extent      : 524048, 531511, 181067, 187470  (xmin, xmax, ymin, ymax)
coord. ref. : +init=EPSG:27700 +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs +ellps=airy +towgs84=446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894 
variables   : 9
names       :    UID,    Price,            Date,     Street,  District,         Region, Postcode, oseast1m, osnrth1m 
min values  : 284509,      400, 01/02/2015 0:00, ABBEY ROAD,    CAMDEN, GREATER LONDON,  EC1M3HA,   524048,   181067 
max values  : 927842, 22500000, 12/31/2015 0:00,   YORK WAY, ISLINGTON, GREATER LONDON,  WC2H9RG,   531511,   187470 
# creates a coloured dot map
tm_shape(OA.Census) + tm_borders(alpha=.4) +
tm_shape(House.Points) + tm_dots(col = "Price", palette = "Reds", style = "quantile") 

Alright! That’s kind of nice! Again. You can keep going, and adding to this map if you check out Practical 6 of https://data.cdrc.ac.uk/tutorial/an-introduction-to-spatial-data-analysis-and-visualisation-in-r, we are going to move onto Kernel smoothing!

Kernel smoothing

A point pattern is really just a collection of geographical points, that are assumed to be generated by a random process. Let’s imagine the series of points in an \((x,y)\) coordinate system, then we can imagine the vector of \(n\) observations: \({\bf{x}_i}=\{(x_1, y_1), (x_2,y_2)\dots, (x_n,y_n)\}\) for some variable of interest.

One way to analyse these type of point distributions is to perform kernel smoothing, where you replace each point \(x_i\) with a kernel, which is just a simple localised function \(f({\bf x_i})\) centred on each of our points. We can then and add up all these kernels to get a smooth intensity function, which describes how the distribution of points behaves over space. You could really just think of this as a “bump averaging process”.

To visualise this process, imagine that you have some blotter paper, and you are putting a dot of ink at each point over your paper. The ink will spread out, with a darker spot in the centre. If you do this for a lot of different dots, then they will start to overlap, getting darker. That is your intensity function!

You need to decide things like what type of kernel function to use, what shape it has etc. but the density function in spatstat has some pretty good ones for you to use if you need something different. (This data camp course gives a good introduction: https://campus.datacamp.com/courses/spatial-statistics-in-r/). Here, we are going to use a more modern package that couples very nicely with our tmap object.

The kernelUD function runs a straightforward estimation of our house price data:

library(adehabitatHR)
Loading required package: deldir
deldir 0.1-14
Loading required package: ade4
Loading required package: adehabitatMA
Loading required package: adehabitatLT
Loading required package: CircStats
Loading required package: MASS
Loading required package: boot
# runs the kernel density estimation,look up the function parameters for more options
kde.output <- kernelUD(House.Points, h="href", grid = 1000)
xy should contain only one column (the id of the animals)
id ignored
plot(kde.output)

Interesting. Can we put it on our map? First we have to convert it to a raster object, and set its coordinate system, so that we can then map it onto our previous map:

library(raster)

Attaching package: ‘raster’

The following objects are masked from ‘package:MASS’:

    area, select

The following object is masked from ‘package:adehabitatMA’:

    buffer
#convert our kernel density map to a raster object
kde <- raster(kde.output)
# set the coordinate projection of our object to British National Grid (the one we have been using all along)
projection(kde) <- CRS("+init=EPSG:27700")
#map it!
tm_shape(kde) + tm_raster("ud")

We can zoom in to match it to our previous map (if you have forgotten what that was then go back up and remind yourself of what the Output.Areas map was!)

masked_kde <- mask(kde, Output.Areas)
plot(masked_kde) # a quick sanity check!

(There is normally a good bet that the plot function will exist for things of these type, so it is worth a try if you don’t know what to do with something… that is pretty much what we did here ;)

Nice! Looks like things are working… Let’s neaten it up a bit with some of the functionality built for tmap!

library(tmaptools) # provides a set of tools for processing spatial data
# creates a bounding box based on the extents of the Output.Areas polygon (to make our map smaller)
bounding_box <- bb(Output.Areas)
# maps the raster within the bounding box
tm_shape(masked_kde, bbox = bounding_box) + tm_raster("ud")

But if we want to get something nicer still we can keep on fiddling… I will leave it for you to look up all the calls in the help!

tm_shape(masked_kde, bbox = bounding_box) + tm_raster("ud", style = "quantile", n = 100, legend.show = FALSE, palette = "YlGnBu") +
tm_shape(Output.Areas) + tm_borders(alpha=.3, col = "white") +
tm_layout(frame = FALSE)

Nice! That tells us a lot about how house prices are changing as we move around in Camden. (Any groups think that kind of thing might be useful? ;)

More information

Spatial Attribute Analysis

Quite often we get data in regions (e.g. LGA, SA4 etc.) instead of as a point function. This is so common that there are a whole range of techniques for dealing with it.

Spatial Autocorrelaton

Spatial autocorrelation is an extension of temporal autocorrelation (which is covered in the workbook on Time Series). It is a bit more complex though, as spatial objects have (at least) two dimensions and, like we have seen above, the regions can have complex shapes. This means that it may not be obvious how to determine what is “near”, and there are lots of different measures of this (we are just going to consider one, but you will find lots of resources about other methods as you work through this section).

A spatial autocorrelation measures how distance influences a particular variable. Basically, we expect things that are close together to be similar. So for example, a contagious disease is more likely to spread to people who live close to each other, or people tend to cluster together in suburbs with people like them. Can we think of a variable that might do this in our dataset? What about qualifications? You might expect that suburbs with a lot of people with degrees will tend to cluster together perhaps…

Let’s go back to our map, and adjust it a bit to see if this might be occurring in Camden.

tm_shape(OA.Census) + tm_fill("Qualification", palette = "Reds", style = "quantile", title = "% with a Qualification") + tm_borders(alpha=.4)  

Looks like this might be happening actually. How could we test whether this is statistically significant though? We will use the spdep package to look at spatial autocorrelation.

library(spdep)
Loading required package: Matrix

Attaching package: ‘spdep’

The following object is masked from ‘package:ade4’:

    mstree

First, we have to find out which polygons in our map are neighbours with one another. We are going to use one particular method here:

neighbours <- poly2nb(OA.Census, queen = FALSE)
neighbours
Neighbour list object:
Number of regions: 749 
Number of nonzero links: 4176 
Percentage nonzero weights: 0.7443837 
Average number of links: 5.575434 

By setting queen = FALSE we have set up our map to use a definition of neighbours where more than one shared point is required for two regions to be defined as neighbouring one another. (Look up more options in ?poly2nb and you can find out a lot more details about this in these slides: http://www.bias-project.org.uk/ASDARcourse/unit6_slides.pdf).

We can plot this list of neighbours on our map

plot(OA.Census, border = 'lightgrey')
plot(neighbours, coordinates(OA.Census), add=TRUE, col='red')

Ok. Now we can run an autoregression. First we have to convert our neighbours object to the right kind of data object, one that has weights:

listw <- nb2listw(neighbours)
listw
Characteristics of weights list object:
Neighbour list object:
Number of regions: 749 
Number of nonzero links: 4176 
Percentage nonzero weights: 0.7443837 
Average number of links: 5.575434 

Weights style: W 
Weights constants summary:
    n     nn  S0       S1       S2
W 749 561001 749 285.3793 3113.982

Compare listw and neighbours. See that our new list has extra information about weights? We have specified what type of coding scheme we are going to use, that is, how neighbours with no links should be treated etc. See the preliminary details at ?nb2listw, but if you want to find out more about this process then you could check out slides 23-26 of this set: http://www.bias-project.org.uk/ASDARcourse/unit6_slides.pdf (Lots of other details in here too!)

Ok. We can now run a test for spatial autocorrelation! We are going to use Moran’s test, which creates a correlation score between -1 and 1. Have a look at ?moran to find out more about Moran’s I works, but basically it measures spatial autocorrelation based on both feature locations and feature values simultaneously. (See the wikipedia page for a bit more info: https://en.wikipedia.org/wiki/Moran%27s_I) Enough already - lets do the test! Are the regions autocorrelated?

moran.test(OA.Census$Qualification, listw)

    Moran I test under randomisation

data:  OA.Census$Qualification  
weights: listw  

Moran I statistic standard deviate = 24.292, p-value <
2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
     0.5448699398     -0.0013368984      0.0005055733 

Ok. Our p-value is significant and our I statistic is positive (i.e correlated). So the regions seem to be somewhat correlated, as we expected to see from the red image above where we saw that regions with more qualifications tended to cluster together. This page has more details about interpreting Moran’s I, athough it is for a different package from R: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/How_Spatial_Autocorrelation_Global_Moran_s_I_works/005p0000000t000000/

This is what is known as a global autocorrelation test. We are looking over the whole map to measure spatial correlation.

We can also do a local test for autocorrelation. Let’s start by looking at how the Qualifications are changing against spatially lagged values. If you want to find out more about how this works, then this page is a great place to start: http://rspatial.org/analysis/rst/3-spauto.html

moran <- moran.plot(OA.Census$Qualification, listw = nb2listw(neighbours, style = "W"))

Hmmm… seems like there might be some sort of a positive relationship? There are quite a few outliers, perhaps we can learn a bit more. The localmoran function is what we need (check the help file)

local <- localmoran(x = OA.Census$Qualification, listw = nb2listw(neighbours, style = "W"))
head(local)
           Ii         E.Ii    Var.Ii        Z.Ii  Pr(z > 0)
0  0.04369897 -0.001336898 0.2486902  0.09030862 0.46402099
1  0.05684782 -0.001336898 0.2486902  0.11667547 0.45355861
2 -0.13604966 -0.001336898 0.1415364 -0.35807555 0.63985661
3  0.49387693 -0.001336898 0.3320321  0.85941460 0.19505591
4  0.64934401 -0.001336898 0.1415364  1.72955349 0.04185504
5 -0.28284849 -0.001336898 0.4987159 -0.39862974 0.65491698

Right… so this is producing a whole set of values for each region! We could add this to our shapefile and then we would be able to map it…

# binds results to our polygon shapefile
moran.map <- cbind(OA.Census, local)
# maps the results
tm_shape(moran.map) + tm_fill(col = "Ii", style = "quantile", title = "local moran statistic") 

So this shows us the variations in autocorrelation across space. But are they significant? We could try mapping the p-values the same way. First we need to work out what they are called:

names(moran.map@data)
 [1] "OA11CD"        "White_British" "Low_Occupancy" "Unemployed"   
 [5] "Qualification" "Ii"            "E.Ii"          "Var.Ii"       
 [9] "Z.Ii"          "Pr.z...0."    
# maps the p-values
tm_shape(moran.map) + tm_fill(col = "Pr.z...0.", style = "fixed", breaks=c(0.001,0.01,0.05,0.1,0.2,1), title = "p-values") 

Interesting… Can you interpret this? Is it what you expected? (Make sure you look at both the local moran statistics and the p-values in interpreting your results.)

In practice you should really be comparing this to a randomly generated Markov Chain approach. The Data Camp course on https://campus.datacamp.com/courses/spatial-statistics-in-r/ goes into this in a lot of detail.

More information

Geographically Weighted Regression

Ok… final step! Can we work out how to use areas in a regression model? Yes we can! Geographically Weighted Regression (GWR) can be used to identify how regression coefficients may vary across the study area. (So it relaxes assumptions about stationarity for spatial data.)

Standard linear regression over a geographical dataset

First, let’s try running just a normal regression model, that tries to explain Qualifications in terms of the other variables in the census dataset:

lm.model <- lm(OA.Census$Qualification ~ OA.Census$Unemployed+OA.Census$White_British)
summary(lm.model)

Call:
lm(formula = OA.Census$Qualification ~ OA.Census$Unemployed + 
    OA.Census$White_British)

Residuals:
    Min      1Q  Median      3Q     Max 
-50.311  -8.014   1.006   8.958  38.046 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)             47.86697    2.33574   20.49   <2e-16 ***
OA.Census$Unemployed    -3.29459    0.19027  -17.32   <2e-16 ***
OA.Census$White_British  0.41092    0.04032   10.19   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.69 on 746 degrees of freedom
Multiple R-squared:  0.4645,    Adjusted R-squared:  0.463 
F-statistic: 323.5 on 2 and 746 DF,  p-value: < 2.2e-16

Its not a bad model. Explains 46.5% of the variance, good p-values… what do the residuals look like?

plot(lm.model)

Not brillant, but kind of ok. Is there a spatial distribution to them though? (There should be - we just found it using the spatial autocorrelation stuff!)

resids<-residuals(lm.model)
map.resids <- cbind(OA.Census, resids) 
# we need to rename the column header from the resids file - in this case its the 6th column of map.resids
names(map.resids)[6] <- "resids"
# maps the residuals using the quickmap function from tmap
qtm(map.resids, fill = "resids")

tm_shape(map.resids) + tm_fill("resids") 

Seems like there might be a bit of a spatial dependence there.

GWR

Ok. Can we take account of spatial variables along with our regression model? First we need to work out how to model the kernel. We will use

library("spgwr")
NOTE: This package does not constitute approval of GWR
as a method of spatial analysis; see example(gwr)
#calculate kernel bandwidth
GWRbandwidth <- gwr.sel(OA.Census$Qualification ~ OA.Census$Unemployed+OA.Census$White_British, data=OA.Census,adapt=T)
Adaptive q: 0.381966 CV score: 101420.8 
Adaptive q: 0.618034 CV score: 109723.2 
Adaptive q: 0.236068 CV score: 96876.06 
Adaptive q: 0.145898 CV score: 94192.41 
Adaptive q: 0.09016994 CV score: 91099.75 
Adaptive q: 0.05572809 CV score: 88242.89 
Adaptive q: 0.03444185 CV score: 85633.41 
Adaptive q: 0.02128624 CV score: 83790.04 
Adaptive q: 0.01315562 CV score: 83096.03 
Adaptive q: 0.008130619 CV score: 84177.45 
Adaptive q: 0.01535288 CV score: 83014.34 
Adaptive q: 0.01515437 CV score: 82957.49 
Adaptive q: 0.01436908 CV score: 82857.74 
Adaptive q: 0.01440977 CV score: 82852.4 
Adaptive q: 0.01457859 CV score: 82833.25 
Adaptive q: 0.01479852 CV score: 82855.45 
Adaptive q: 0.01461928 CV score: 82829.32 
Adaptive q: 0.01468774 CV score: 82823.82 
Adaptive q: 0.01473006 CV score: 82835.89 
Adaptive q: 0.01468774 CV score: 82823.82 

Ok. Now we can use this value of the bandwidth in our model (don’t forget to look up the relevant help pages to find out what we are doing here!)

#fit the gwr model (note it has the same formula as before)
gwr.model = gwr(OA.Census$Qualification ~ OA.Census$Unemployed+OA.Census$White_British, data = OA.Census, adapt=GWRbandwidth, hatmatrix=TRUE, se.fit=TRUE) 
#OK. What did it do?
gwr.model
Call:
gwr(formula = OA.Census$Qualification ~ OA.Census$Unemployed + 
    OA.Census$White_British, data = OA.Census, adapt = GWRbandwidth, 
    hatmatrix = TRUE, se.fit = TRUE)
Kernel function: gwr.Gauss 
Adaptive quantile: 0.01468774 (about 11 of 749 data points)
Summary of GWR coefficient estimates at data points:
                            Min.  1st Qu.   Median  3rd Qu.
X.Intercept.            11.08183 34.43427 45.76862 59.75372
OA.Census.Unemployed    -5.45291 -3.28308 -2.55398 -1.79413
OA.Census.White_British -0.28046  0.19955  0.37788  0.53216
                            Max.  Global
X.Intercept.            85.01866 47.8670
OA.Census.Unemployed     0.77019 -3.2946
OA.Census.White_British  0.94678  0.4109
Number of data points: 749 
Effective number of parameters (residual: 2traceS - traceS'S): 132.6449 
Effective degrees of freedom (residual: 2traceS - traceS'S): 616.3551 
Sigma (residual: 2traceS - traceS'S): 9.903539 
Effective number of parameters (model: traceS): 94.44661 
Effective degrees of freedom (model: traceS): 654.5534 
Sigma (model: traceS): 9.610221 
Sigma (ML): 8.983902 
AICc (GWR p. 61, eq 2.33; p. 96, eq. 4.21): 5633.438 
AIC (GWR p. 96, eq. 4.22): 5508.777 
Residual sum of squares: 60452.16 
Quasi-global R2: 0.7303206 

So… what happened? Let’s compare a few things. First - what are the components of the models?

names(lm.model)
 [1] "coefficients"  "residuals"     "effects"       "rank"         
 [5] "fitted.values" "assign"        "qr"            "df.residual"  
 [9] "xlevels"       "call"          "terms"         "model"        
names(gwr.model)
 [1] "SDF"       "lhat"      "lm"        "results"   "bandwidth"
 [6] "adapt"     "hatmatrix" "gweight"   "gTSS"      "this.call"
[11] "fp.given"  "timings"  

What if we look at the coefficients for the global model (i.e. the standard linear regression fit) again?

lm.model$coefficients
            (Intercept)    OA.Census$Unemployed 
             47.8669683              -3.2945908 
OA.Census$White_British 
              0.4109194 

How do these compare with the ranges of the intercept and the model coefficients in the summary for gwr.model? What has changed by us running a local regression? They are not completely different, but if we compare the Median values then we see that they have changed a little bit.

This is because in our GWR, we have sets of these intercept and coefficient values for each map region (along with a whole heap of information about the error of the fit at that point) . We can look at them easily if we force the part of the model with the fit.points etc. to a data.frame

results <-as.data.frame(gwr.model$SDF)
results

Ha. So the coefficients are changing for each council region, along with the intercepts, and some other bits and pieces.

Just as a by the way… we are using the @ here to refer to a slot in the S4 object that we know is a data.frame (because of the help pages). Here is another way to force out the same information that is probably a bit more familiar:

gwr.model$SDF@data

Exactly the same right? If you want to find out more about S4 functions then you should go and do Chapter 2 of this Data Camp course: https://campus.datacamp.com/courses/working-with-geospatial-data-in-r/ which goes into what sp objects are (objects of class S4), what this means, and how they behave in quite a bit of detail.

If we don’t force the sp object to a data.frame then we can call its native plot functions

spplot(gwr.model$SDF)

Fancy. We can see how they change. Its a bit slow though. And not really a good idea to compare them like this. However, if we want to then we can map out individual results data again…

gwr.map <- cbind(OA.Census, as.matrix(results))
tm_shape(gwr.map) + tm_fill("localR2")

If you want to see more about how to render the different bits in comparison graphs that use different scales (which is where the above plot call really falls down) then look up gridExtra. (See the end of Practical 10 of https://data.cdrc.ac.uk/tutorial/an-introduction-to-spatial-data-analysis-and-visualisation-in-r for a quick introduction to using this very handy package.)

More information

---
title: "Spatial Analysis"
author: "Kirsty Kitto"
date: "May 26 2018"
output:
  html_notebook: default
  html_document: default
---

Spatial data brings a whole set of new problems with it. A point of particular divergence from our regression models, is brought on by the fact that spatial data is often autocorrelated in the random component. That means that nearby errors are often correlated, and violates one of our core assumptions of regression (about the error terms being independent from one another). We need to adjust our models to account for this problem. 

With spatial data we are often interested in asking questions about things like:

* whether our datapoints are spatially clustered?
* if two types of event have the same type of spatial distribution? (e.g. two different types of crime, or two different diseases)
* whether a contagion lies behind an outbreak of a disease? (i.e. is a new case more likely to occur near one of the existing points in a dataset?)
* is an apparent cluster of accidents just due to the fact that there are lots of people are crossing the road there? 

R has a whole suite of tools for dealing with spatial data. And the visualisation capabilities in particular are very nicely set up. Let's spend a little bit of time looking at mapping before we move onto generating some statistical models for our data. 

If you really want to find out about how to deal with spatial data in R, then this is a great site, full of some lovely tutorials on everything you are likely to want to know: https://data.cdrc.ac.uk/tutorial/an-introduction-to-spatial-data-analysis-and-visualisation-in-r 

I definitely recommend this site! You will need to create an account and log in, but it is definitely worth your while. It has very nice workbooks on Point mapping, kernel smoothing, spatial autocorrelation... almost everything you are likely to know. I have taken a lot of what follows from that site, but this workbook will only be scratching the surface of what is there! (Albeit in one relatively quick to work through workbook as opposed to the 12 Practicals that you will find in that site!)

##Mapping

When we are dealing with spatial data, we can often end up with a number of different types of data and these can be stored in quite complex datatypes. For example, maps often include:

* point data (e.g. specific places where some event occurs)
* line data (data associated with a line, which we get by joining two points e.g. rivers, roads)
* polygon data (data associated with an area, e.g. SA2)
* raster data (data associated with a grid, e.g. latitude and longitude)

The `sp` package is a standard way of dealing with spatial data (stored as objects of type Spatial Polygons). Many packages depend upon it, so it is worth learning about if you are planning to do anything with spatial data. 

Often we will get information about spatial objects (i.e. maps) from what is called an *OGR data source*, which is something that has been constructed according to the Geospatial Data Abstraction Library (GDAL), which is really just a data standard for geospatial data (https://en.wikipedia.org/wiki/GDAL). 

R has some pretty good tools for dealing with this stuff. In particular, the `readOGR()` function in the `rgdal` package, can be used to read in OGR data. Let's load it:

```{r}
library(sp)
library(rgdal)
```

We can use `readOGR` to load up the OGR files that have been made available here: https://data.cdrc.ac.uk/dataset/aa5491c9-cbac-4026-97c9-f9168462f4ac/resource/f8ecc2bc-5313-468d-9653-11f0fd752a7d/download/camdenoa11.zip 

Make sure that you download them now, and move them to the directory where you are running your code! (i.e. the pathname might not work!) Have a look at the files - they should look like this:

*  Camden_oa11.dbf
*  Camden_oa11.prj
*  Camden_oa11.shp
*  Camden_oa11.shx

R knows how to load them all if you specify the path name ("Cambden" for me - I have them in a folder) and the name of the file (i.e. the bit out the front without the filename extension.)

```{r}
Output.Areas<- readOGR("Camden","Camden_oa11")
```

Remember to run `?readOGR` to find out more about this function (and the OGR format).

You could look at the structure this map using `str(Output.Areas)` but its REALLY long! I do recommend you examine the structure of this beasty at least once to get a feeling for what is going on in it. Also - beware - the actual data object itself is even longer! (It takes a lot of information to make a map!) Let's just look at the `head` of this data for now
```{r}
head(Output.Areas)
```

Hmmmm... doesn't tell us much ;) you had better explore it in your console (but I am not going to do that here because it will make us all do too much scrolling to get past it).

The weird structure you get occurs because `sp` objects are stored as S4 objects in R. This is a specific object oriented data format, which is quite complex. You can find out more about all of this in this data camp course https://www.datacamp.com/courses/working-with-geospatial-data-in-r (which is quite good! It is definitely worth your while learning about S4 objects in R, and this is a good way to do it.)

Ok... this thing is of class "SpatialPolygonsDataFrame". That is a complex beast! (Try running ?SpatialPolygonsDataFrame to find out a bit more about *them*.) A cool thing about these objects, is that they have a nice handy plot method:

```{r}
plot(Output.Areas)
```

Its a map of Cambden! (I guess we should have expected that given that the files are called Cambden_* :| - have a look here if you don't believe me: https://en.wikipedia.org/wiki/London_Borough_of_Camden). 

If you want to find some Australian GIS data then you could try these sites:   
- http://www.naturalearthdata.com/downloads/   
- http://openstreetmapdata.com/data 

Ok... so we have our map. What can we do with it? Well... if we had some data then we could join it to our map... Then we could do lots of cool things ;)

Luckily the same site makes census data for the UK available in an easy to use form. You can download it here: https://data.cdrc.ac.uk/dataset/aa5491c9-cbac-4026-97c9-f9168462f4ac/resource/27949d58-8d9b-43b0-b0dd-dfbe47609cbf/download/practicaldata.csv (don't forget to put it in the right place so that R can find it):

```{r}
Census.Data <-read.csv("practicaldata.csv")
```

(Have a look a the data! What is in there?)

Ok. How do we join it? We use a `merge` function that comes with the `sp` package:

```{r}
OA.Census <- merge(Output.Areas, Census.Data, by.x="OA11CD", by.y="OA")
```

(Make sure you look up what I did there using the helpfile... in particular - can you work out what is going on with those `by.x` and `by.y` options?)

Ok... no errors so it looks like the data merged properly. Now what do we have?

```{r}
head(OA.Census)
```

Oooo... it is one of those SPdataframe thingos with 5 variables now (try looking at `OA.Census` and `Output.Areas` in your Environment tab on the right to work out what is different between the two `sp` objects - what has happened?) 

Basically... we have added some data to our map! 5 new variables are now in there about the number of White_Brittish citizens there are in an area, the Low_Occupancy info, how many Unemployed, and their Qualification. Go and look at the site we got the data from to find out more about this dataset (Practical 1 has information about where the data comes from: https://data.cdrc.ac.uk/dataset/aa5491c9-cbac-4026-97c9-f9168462f4ac/resource/09a1093a-74b9-487b-a939-d928f27be612/download/practical1.html). 

Now. Many different coordinate systems are possible. If you are having trouble using GIS data this could well be what is going wrong. Here, we are going to set the coordinate system to the British National Grid (which is the format the data actually came in - we are just making sure, and it is an important thing to learn about):

```{r}
proj4string(OA.Census) <- CRS("+init=EPSG:27700")
```

Cool. A really good place to go if you want to learn more about mapping (as well as some of the R S4 classes like SpatialPolygonsDataFrame and how they work!) is this data camp course: https://campus.datacamp.com/courses/working-with-geospatial-data-in-r. 
Chapters 2 onwards are very nice, and provide a good introduction to the `sp`, `raster` and `rgdal` packages, setting coordinate systems etc. I strongly recommend working through this course if you want to find out more about what we are doing here (we are skimming through!)

Now... we could make our map look nicer. Let's use the `tmap` package, which works quite a bit like ggplot (i.e. it layers the graphics up, so is quite powerful).

```{r}
library(tmap)
```

Have a look at `?tm_shape` and `?tm_fill` and you will start working out how it can be used. For a list of all possible functions try `?tmap`. That Data Camp course I mentioned above  (https://campus.datacamp.com/courses/working-with-geospatial-data-in-r) is the place to go if you want to find out more about how `tmap` works. We don't have time to cover it here. 

Let's try using it! What is the spread of Qualifications acorss Camden?

```{r}
tm_shape(OA.Census) + tm_fill("Qualification") 
```

Nice - you can find out a lot more about mapping options by working through the rest of Practical 5 on https://data.cdrc.ac.uk/tutorial/an-introduction-to-spatial-data-analysis-and-visualisation-in-r (we are going to move onto more statistical things!)

###More information

We are not going to cover mapping in any more detail - you could easily complete a whole course just in this! For more information you could start with the following two resources (which I have already referred to above!)

* https://www.datacamp.com/courses/working-with-geospatial-data-in-r
* https://data.cdrc.ac.uk/tutorial/an-introduction-to-spatial-data-analysis-and-visualisation-in-r (Practical 5)


##Point Pattern Analysis

So... we have a map. Let's put some more data onto it. The same site makes some point data about house prices available via this link: https://data.cdrc.ac.uk/dataset/aa5491c9-cbac-4026-97c9-f9168462f4ac/resource/24c4f527-43c8-4c09-8558-ad6a48e17de5/download/camdenhousesales15.csv

Let's look at it (make sure you ahve downloaded the data from the site! You will need to log in to access it which is why we are not bothering to read it directly):

```{r}
houses <- read.csv("CamdenHouseSales15.csv")
houses
```

Right-o. We have seen things like this before. Let's quickly plot out what looks like the lattitude/longitude data:

```{r}
plot(houses$oseast1m, houses$osnrth1m)
```

Yep - it also looks like Cambden! We can probably just map the two datasets together. If we turn this object (which is just a dataframe) into a SpatialPointsDataFrame (like OA.Census) then we will be able to layer them on one another! We need to be a bit careful about that coordinate frame thing again though (i.e. we need to set it again). 

So. We need to set what the data is to be included (`houses`), what columns contain the x and y coordinates (`houses[,3:4]`), and what projection system we are using (`CRS("+init=EPSG:27700")`).

```{r}
House.Points <-SpatialPointsDataFrame(houses[,8:9], houses, proj4string = CRS("+init=EPSG:27700"))
```

Let's print it just to be sure:

```{r}
House.Points
```


```{r}
# creates a coloured dot map
tm_shape(OA.Census) + tm_borders(alpha=.4) +
tm_shape(House.Points) + tm_dots(col = "Price", palette = "Reds", style = "quantile") 
```

Alright! That's kind of nice! Again. You can keep going, and adding to this map if you check out Practical 6 of https://data.cdrc.ac.uk/tutorial/an-introduction-to-spatial-data-analysis-and-visualisation-in-r, we are going to move onto Kernel smoothing!



### Kernel smoothing


A point pattern is really just a collection of geographical points, that are assumed to be generated by a random process. Let's imagine the series of points in an $(x,y)$ coordinate system, then we can imagine the vector of $n$ observations:
${\bf{x}_i}=\{(x_1, y_1), (x_2,y_2)\dots, (x_n,y_n)\}$ for some variable of interest. 

One way to analyse these type of point distributions is to perform kernel smoothing, where you replace each point $x_i$ with a *kernel*, which is just a simple localised function $f({\bf x_i})$ centred on each of our points. We can then and add up all these kernels to get a smooth intensity function, which describes how the distribution of points behaves over space. You could really just think of this as a "bump averaging process". 

To visualise this process, imagine that you have some blotter paper, and you are putting  a dot of ink at each point over your paper. The ink will spread out, with a darker spot in the centre. If you do this for a lot of different dots, then they will start to overlap, getting darker. That is your intensity function!

You need to decide things like what type of kernel function to use, what shape it has etc. but the density function in `spatstat` has some pretty good ones for you to use if you need something different. (This data camp course gives a good introduction: https://campus.datacamp.com/courses/spatial-statistics-in-r/). Here, we are going to use a more modern package that couples very nicely with our tmap object. 

The `kernelUD` function runs a straightforward estimation of our house price data:

```{r}
library(adehabitatHR)
# runs the kernel density estimation,look up the function parameters for more options
kde.output <- kernelUD(House.Points, h="href", grid = 1000)
plot(kde.output)
```

Interesting. Can we put it on our map? First we have to convert it to a `raster` object, and set its coordinate system, so that we can then map it onto our previous map:

```{r}
library(raster)
#convert our kernel density map to a raster object
kde <- raster(kde.output)
# set the coordinate projection of our object to British National Grid (the one we have been using all along)
projection(kde) <- CRS("+init=EPSG:27700")
#map it!
tm_shape(kde) + tm_raster("ud")
```

We can zoom in to match it to our previous map (if you have forgotten what that was then go back up and remind yourself of what the `Output.Areas` map was!)

```{r}
masked_kde <- mask(kde, Output.Areas)
plot(masked_kde) # a quick sanity check!
```
(There is normally a good bet that the `plot` function will exist for things of these type, so it is worth a try if you don't know what to do with something... that is pretty much what we did here ;)

Nice! Looks like things are working... Let's neaten it up a bit with some of the functionality built for `tmap`! 

```{r}
library(tmaptools) # provides a set of tools for processing spatial data

# creates a bounding box based on the extents of the Output.Areas polygon (to make our map smaller)
bounding_box <- bb(Output.Areas)

# maps the raster within the bounding box
tm_shape(masked_kde, bbox = bounding_box) + tm_raster("ud")
```

But if we want to get something nicer still we can keep on fiddling... I will leave it for you to look up all the calls in the help!

```{r}
tm_shape(masked_kde, bbox = bounding_box) + tm_raster("ud", style = "quantile", n = 100, legend.show = FALSE, palette = "YlGnBu") +
tm_shape(Output.Areas) + tm_borders(alpha=.3, col = "white") +
tm_layout(frame = FALSE)
```

Nice! That tells us a lot about how house prices are changing as we move around in Camden. (Any groups think that kind of thing might be useful? ;)

###More information

* Practical 8 of https://data.cdrc.ac.uk/tutorial/an-introduction-to-spatial-data-analysis-and-visualisation-in-r
* A very detailed online book: http://www.spatialanalysisonline.com/HTML/index.html?introduction_and_terminology.htm
* https://campus.datacamp.com/courses/spatial-statistics-in-r/ A data camp course that has some good practice exercises, although I find it a bit longwinded in some of the examples. 
* Chapter 6 of Brunsdon, C., & Comber, L. (2015). An introduction to R for spatial analysis and mapping. Sage. (Available in library: http://find.lib.uts.edu.au/?R=OPAC_b2914294)


##Spatial Attribute Analysis

Quite often we get data in regions (e.g. LGA, SA4 etc.) instead of as a point function. This is so common that there are a whole range of techniques for dealing with it. 

###Spatial Autocorrelaton

Spatial autocorrelation is an extension of temporal autocorrelation (which is covered in the workbook on Time Series). It is a bit more complex though, as spatial objects  have (at least) two dimensions and, like we have seen above, the regions can have complex shapes. This means that it may not be obvious how to determine what is "near", and there are lots of different measures of this (we are just going to consider one, but you will find lots of resources about other methods as you work through this section).

A spatial autocorrelation measures how distance influences a particular variable. Basically, we expect things that are close together to be similar. So for example, a contagious disease is more likely to spread to people who live close to each other, or people tend to cluster together in suburbs with people like them. Can we think of a variable that might do this in our dataset? What about qualifications? You might expect that suburbs with a lot of people with degrees will tend to cluster together perhaps... 

 Let's go back to our map, and adjust it a bit to see if this might be occurring in Camden. 

```{r}
tm_shape(OA.Census) + tm_fill("Qualification", palette = "Reds", style = "quantile", title = "% with a Qualification") + tm_borders(alpha=.4)  
```

Looks like this might be happening actually. How could we test whether this is statistically significant though? We will use the `spdep` package to look at spatial autocorrelation. 

```{r}
library(spdep)
```

First, we have to find out which polygons in our map are neighbours with one another. We are going to use one particular method here:

```{r}
neighbours <- poly2nb(OA.Census, queen = FALSE)
neighbours
```

By setting `queen = FALSE` we have set up our map to use a definition of neighbours where more than one shared point is required for two  regions to be defined as neighbouring one another. (Look up more options in `?poly2nb` and you can find out a lot more details about this in these slides: http://www.bias-project.org.uk/ASDARcourse/unit6_slides.pdf).

We can plot this list of neighbours on our map

```{r}
plot(OA.Census, border = 'lightgrey')
plot(neighbours, coordinates(OA.Census), add=TRUE, col='red')
```

Ok. Now we can run an autoregression. First we have to convert our neighbours object to the right kind of data object, one that has weights:

```{r}
listw <- nb2listw(neighbours)
listw
```

Compare `listw` and `neighbours`. See that our new list has extra information about weights? We have specified what type of coding scheme we are going to use, that is, how neighbours with no links should be treated etc. See the preliminary details at `?nb2listw`, but if you want to find out more about this process then you could check out slides 23-26 of this set: http://www.bias-project.org.uk/ASDARcourse/unit6_slides.pdf (Lots of other details in here too!)

Ok. We can now run a test for spatial autocorrelation! We are going to use Moran's test, which creates a correlation score between -1 and 1. 
Have a look at `?moran` to find out more about Moran's I works, but basically it measures spatial autocorrelation based on both feature locations and feature values simultaneously. (See the wikipedia page for a bit more info: https://en.wikipedia.org/wiki/Moran%27s_I) Enough already - lets do the test! Are the regions autocorrelated?

```{r}
moran.test(OA.Census$Qualification, listw)
```

Ok. Our p-value is significant and our I statistic is positive (i.e correlated). 
So the regions seem to be somewhat correlated, as we expected to see from the red image above where we saw that regions with more qualifications tended to cluster together. This page has more details about interpreting Moran's I, athough it is for a different package from R: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/How_Spatial_Autocorrelation_Global_Moran_s_I_works/005p0000000t000000/

This is what is known as a *global* autocorrelation test. We are looking over the whole map to measure spatial correlation. 

We can also do a local test for autocorrelation. Let's start by looking at how the Qualifications are changing against *spatially* lagged values. 
If you want to find out more about how this works, then this page is a great place to start: http://rspatial.org/analysis/rst/3-spauto.html

```{r}
moran <- moran.plot(OA.Census$Qualification, listw = nb2listw(neighbours, style = "W"))
```

Hmmm... seems like there might be some sort of a positive relationship? There are quite a few outliers, perhaps we can learn a bit more. The `localmoran` function is what we need (check the help file)

```{r}
local <- localmoran(x = OA.Census$Qualification, listw = nb2listw(neighbours, style = "W"))
head(local)
```

Right... so this is producing a whole set of values for each region! We could add this to our shapefile and then we would be able to map it... 

```{r}
# binds results to our polygon shapefile
moran.map <- cbind(OA.Census, local)

# maps the results
tm_shape(moran.map) + tm_fill(col = "Ii", style = "quantile", title = "local moran statistic") 
```

So this shows us the variations in autocorrelation across space. But are they significant? We could try mapping the p-values the same way. First we need to work out what they are called:

```{r}
names(moran.map@data)
```


```{r}
# maps the p-values
tm_shape(moran.map) + tm_fill(col = "Pr.z...0.", style = "fixed", breaks=c(0.001,0.01,0.05,0.1,0.2,1), title = "p-values") 
```

Interesting... Can you interpret this? Is it what you expected? (Make sure you look at both the local moran statistics and the p-values in interpreting your results.)

In practice you should really be comparing this to a randomly generated Markov Chain approach. The Data Camp course on https://campus.datacamp.com/courses/spatial-statistics-in-r/ goes into this in a lot of detail. 

####More information

* Practical 9 of https://data.cdrc.ac.uk/tutorial/an-introduction-to-spatial-data-analysis-and-visualisation-in-r is very brief, and does not go into a lot more details than what we have covered. You would do well to use some of the extra resources below as well to find out more. 
* A nice workbook on spatial autocorrelation http://rspatial.org/analysis/rst/3-spauto.html
* A whole online book on Geospatial Analysis, which has a section on spatial autocorrelation: http://www.spatialanalysisonline.com/HTML/?spatial_autocorrelation.htm
* This set of slides goes into a lot more details about the various ways of calculating distance, how the `spdep` functions work etc. Worth checking out if you want to know more: http://www.bias-project.org.uk/ASDARcourse/unit6_slides.pdf
* https://campus.datacamp.com/courses/spatial-statistics-in-r/
* Chapter 7 of Brunsdon, C., & Comber, L. (2015). An introduction to R for spatial analysis and mapping. Sage. (Available in library: http://find.lib.uts.edu.au/?R=OPAC_b2914294)
* A spatial analysis completed by a past STDS student for Assessment 3: https://16-6143.ca.uts.edu.au/spatial-investigation-of-bom-data-assumption-used-in-earlier-asthma-hospiatlisation-investigation/


###Geographically Weighted Regression

Ok... final step! Can we work out how to use areas in a regression model? Yes we can! Geographically Weighted Regression (GWR) can be used to identify how regression coefficients may vary across the study area. (So it relaxes assumptions about stationarity for spatial data.)

####Standard linear regression over a geographical dataset

First, let's try running just a normal regression model, that tries to explain Qualifications in terms of the other variables in the census dataset:

```{r}
lm.model <- lm(OA.Census$Qualification ~ OA.Census$Unemployed+OA.Census$White_British)
summary(lm.model)
```

Its not a bad model. Explains 46.5% of the variance, good p-values... what do the residuals look like?

```{r}
plot(lm.model)
```

Not brillant, but kind of ok. Is there a spatial distribution to them though? (There should be - we just found it using the spatial autocorrelation stuff!)

```{r}
resids<-residuals(lm.model)

map.resids <- cbind(OA.Census, resids) 
# we need to rename the column header from the resids file - in this case its the 6th column of map.resids
names(map.resids)[6] <- "resids"

# maps the residuals using the quickmap function from tmap
qtm(map.resids, fill = "resids")

tm_shape(map.resids) + tm_fill("resids") 
```

Seems like there might be a bit of a spatial dependence there. 

#####GWR

Ok. Can we take account of spatial variables along with our regression model? First we need to work out how to model the kernel. We will use 

```{r}
library("spgwr")

#calculate kernel bandwidth
GWRbandwidth <- gwr.sel(OA.Census$Qualification ~ OA.Census$Unemployed+OA.Census$White_British, data=OA.Census,adapt=T)
```

Ok. Now we can use this value of the bandwidth in our model (don't forget to look up the relevant help pages to find out what we are doing here!)

```{r}
#fit the gwr model (note it has the same formula as before)
gwr.model = gwr(OA.Census$Qualification ~ OA.Census$Unemployed+OA.Census$White_British, data = OA.Census, adapt=GWRbandwidth, hatmatrix=TRUE, se.fit=TRUE) 

#OK. What did it do?
gwr.model
```

So... what happened? Let's compare a few things. First - what are the components of the models?

```{r}
names(lm.model)
names(gwr.model)
```

What if we look at the coefficients for the global model (i.e. the standard linear regression fit) again?

```{r}
lm.model$coefficients
```

How do these compare with the ranges of the intercept and the model coefficients in the summary for gwr.model? What has changed by us running a local regression? They are not completely different, but if we compare the Median values then we see that they have changed a little bit.

This is because in our GWR, we have sets of these intercept and coefficient values for each map region (along with a whole heap of information about the error of the fit at that point) . We can look at them easily if we force the part of the model with the fit.points etc. to a data.frame

```{r}
results <-as.data.frame(gwr.model$SDF)
results
```

Ha. So the coefficients are changing for each council region, along with the intercepts, and some other bits and pieces. 

Just as a by the way... we are using the @ here to refer to a slot in the S4 object that we know is a data.frame (because of the help pages). Here is another way to force out the same information that is probably a bit more familiar:

```{r}
gwr.model$SDF@data
```

Exactly the same right? If you want to find out more about S4 functions then you should go and do Chapter 2 of this Data Camp course: https://campus.datacamp.com/courses/working-with-geospatial-data-in-r/ which goes into what `sp` objects are (objects of class S4), what this means, and how they behave in quite a bit of detail. 

If we don't force the `sp` object to a data.frame then we can call its native plot functions

```{r}
spplot(gwr.model$SDF)
```

Fancy. We can see how they change. Its a bit slow though. And not really a good idea to compare them like this. However, if we want to then we can map out individual results data again...

```{r}
gwr.map <- cbind(OA.Census, as.matrix(results))
tm_shape(gwr.map) + tm_fill("localR2")
```

If you want to see more about how to render the different bits in comparison graphs that use different scales (which is where the above `plot` call really falls down) then look up `gridExtra`. (See the end of Practical 10 of https://data.cdrc.ac.uk/tutorial/an-introduction-to-spatial-data-analysis-and-visualisation-in-r for a quick introduction to using this very handy package.)

####More information

* Practical 10 of https://data.cdrc.ac.uk/tutorial/an-introduction-to-spatial-data-analysis-and-visualisation-in-r
* There is a vignette on GWR: `vignette("GWR")` It is not very detailed but does have a list of references where you can find out more about the technique. 
* http://rspatial.org/analysis/rst/6-local_regression.html
* There are lots of tutorials and other resources around on GWR. Try this one: https://rpubs.com/chrisbrunsdon/101305, this one: https://rstudio-pubs-static.s3.amazonaws.com/44975_0342ec49f925426fa16ebcdc28210118.html, or these StackExchange pages: https://gis.stackexchange.com/

